Design and development of an ancient Chinese document recognition system
Internal identifier: 001641 (Main/Exploration); previous: 001640; next: 001642
Authors: Liangrui Peng [People's Republic of China]; Pingping Xiu [People's Republic of China]; Xiaoqing Ding [People's Republic of China]
Source:
- SPIE proceedings series [ISSN 1017-2653]; 2004.
French descriptors
- Pascal (Inist)
- Wicri:
- topic: Numérisation.
English descriptors
Abstract
The digitization of ancient Chinese documents presents new challenges to the OCR (Optical Character Recognition) research field because of the large character set of ancient Chinese, the variety of font types, and the diversity of document layout styles; these documents reflect thousands of years of Chinese civilization. After analyzing the general characteristics of ancient Chinese documents, we present a solution for recognizing ancient Chinese documents with regular font types and layout styles. Building on previous work on multilingual OCR in the TH-OCR system, we focus on the design and development of two key technologies: character recognition and page segmentation. Experimental results show that the developed character recognition kernel, which covers 19,635 Chinese characters, outperforms our original traditional Chinese recognition kernel. A benchmark test on printed ancient Chinese books shows that the proposed system is effective for regular ancient Chinese documents.
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000530
- to stream PascalFrancis, to step Curation: 000260
- to stream PascalFrancis, to step Checkpoint: 000493
- to stream Main, to step Merge: 001704
- to stream Main, to step Curation: 001641
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Design and development of an ancient Chinese document recognition system</title>
<author><name sortKey="Peng, Liangrui" sort="Peng, Liangrui" uniqKey="Peng L" first="Liangrui" last="Peng">Liangrui Peng</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Xiu, Pingping" sort="Xiu, Pingping" uniqKey="Xiu P" first="Pingping" last="Xiu">Pingping Xiu</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Ding, Xiaoqing" sort="Ding, Xiaoqing" uniqKey="Ding X" first="Xiaoqing" last="Ding">Xiaoqing Ding</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">04-0471355</idno>
<date when="2004">2004</date>
<idno type="stanalyst">PASCAL 04-0471355 INIST</idno>
<idno type="RBID">Pascal:04-0471355</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000530</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000260</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000493</idno>
<idno type="wicri:doubleKey">1017-2653:2004:Peng L:design:and:development</idno>
<idno type="wicri:Area/Main/Merge">001704</idno>
<idno type="wicri:Area/Main/Curation">001641</idno>
<idno type="wicri:Area/Main/Exploration">001641</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Design and development of an ancient Chinese document recognition system</title>
<author><name sortKey="Peng, Liangrui" sort="Peng, Liangrui" uniqKey="Peng L" first="Liangrui" last="Peng">Liangrui Peng</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Xiu, Pingping" sort="Xiu, Pingping" uniqKey="Xiu P" first="Pingping" last="Xiu">Pingping Xiu</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Ding, Xiaoqing" sort="Ding, Xiaoqing" uniqKey="Ding X" first="Xiaoqing" last="Ding">Xiaoqing Ding</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint><date when="2004">2004</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Ancient chinese</term>
<term>Chinese</term>
<term>Digitizing</term>
<term>Optical character recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Numérisation</term>
<term>Chinois</term>
<term>Reconnaissance optique caractère</term>
<term>Ancien chinois</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Numérisation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">The digitization of ancient Chinese documents presents new challenges to OCR (Optical Character Recognition) research field due to the large character set of ancient Chinese characters, variant font types, and versatile document layout styles, as these documents are historical reflections to the thousands of years of Chinese civilization. After analyzing the general characteristics of ancient Chinese documents, we present a solution for recognition of ancient Chinese documents with regular font-types and layout-styles. Based on the previous work on multilingual OCR in TH-OCR system, we focus on the design and development of two key technologies which include character recognition and page segmentation. Experimental results show that the developed character recognition kernel of 19,635 Chinese characters outperforms our original traditional Chinese recognition kernel; Benchmarked test on printed ancient Chinese books proves that the proposed system is effective for regular ancient Chinese documents.</div>
</front>
</TEI>
<affiliations><list><country><li>République populaire de Chine</li>
</country>
</list>
<tree><country name="République populaire de Chine"><noRegion><name sortKey="Peng, Liangrui" sort="Peng, Liangrui" uniqKey="Peng L" first="Liangrui" last="Peng">Liangrui Peng</name>
</noRegion>
<name sortKey="Ding, Xiaoqing" sort="Ding, Xiaoqing" uniqKey="Ding X" first="Xiaoqing" last="Ding">Xiaoqing Ding</name>
<name sortKey="Xiu, Pingping" sort="Xiu, Pingping" uniqKey="Xiu P" first="Pingping" last="Xiu">Pingping Xiu</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001641 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001641 | SxmlIndent | more
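Once the record has been extracted, its fields can be read with a general-purpose XML parser. The sketch below is only an illustration, not part of Dilib: it assumes the extracted text has the same shape as the record shown on this page and contains no XML declaration. Because the record uses the `wicri:` and `inist:` prefixes without declaring them, the sketch wraps it in a hypothetical `<wrapper>` element with placeholder namespace URIs. The names `parse_record`, `summarize`, and `SAMPLE` are invented for the example, and `SAMPLE` is an abridged copy of the record.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs (assumptions): the record does not declare these prefixes,
# so any values work as long as the parser sees a declaration.
NS = {"wicri": "urn:example:wicri", "inist": "urn:example:inist"}

# Abridged copy of the record shown on this page.
SAMPLE = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Design and development of an ancient Chinese document recognition system</title>
<author><name sortKey="Peng, Liangrui">Liangrui Peng</name>
<affiliation wicri:level="1"><country>République populaire de Chine</country></affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="RBID">Pascal:04-0471355</idno></publicationStmt>
</fileDesc></teiHeader></TEI></record>"""


def parse_record(record_xml: str) -> ET.Element:
    """Wrap the record so its undeclared prefixes resolve, then parse it."""
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    wrapped = f"<wrapper {decls}>{record_xml}</wrapper>"
    return ET.fromstring(wrapped).find("record")


def summarize(record: ET.Element) -> dict:
    """Collect the title, author names, and RBID identifier from a record."""
    title_stmt = record.find("./TEI/teiHeader/fileDesc/titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "rbid": record.findtext(".//publicationStmt/idno[@type='RBID']"),
    }


info = summarize(parse_record(SAMPLE))
print(info)
```

Wrapping the record leaves the extracted text untouched; stripping the prefixes with a text substitution would also work but would change attribute names such as `wicri:level`.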
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:04-0471355 |texte= Design and development of an ancient Chinese document recognition system }}
This area was generated with Dilib version V0.6.32.